当前的半监督视频对象分割(VOS)方法通常利用一个框架的整个功能来预测对象掩码和更新内存。这引入了重要的冗余计算。为了减少冗余,我们提出了一种区域意识到的视频对象细分(RAVOS)方法,该方法可预测感兴趣的区域(ROI),以进行有效的对象细分和内存存储。 Ravos包括一个快速对象运动跟踪器,可以在下一个帧中预测其ROI。为了有效的分割,根据ROI提取对象特征,并且对象解码器设计用于对象级分割。为了有效的内存存储,我们建议运动路径内存来通过记住两个帧之间对象的运动路径中的特征来滤除冗余上下文。除了Ravos,我们还提出了一个称为OVO的大型数据集,以基准在遮挡下基准VOS模型的性能。对戴维斯和YouTube-VOS基准和我们的新OVOS数据集的评估表明,我们的方法以更快的推理时间来实现最先进的性能,例如,戴维斯的42 fps的86.1 J&F在YouTube-in YouTube-in YouTube-in YouTube-in YouTube-23 fps上达到42 fps- VOS。
translated by 谷歌翻译
每日操纵任务的特征是与动作和对象形状相关的几何基原始人。这样的几何描述符仅通过使用笛卡尔坐标系统而差异很差。在本文中,我们提出了一种学习方法,以从坐标系词典中提取最佳表示,以编码观察到的运动/行为。这是通过在Riemannian歧管上使用高斯分布的扩展来实现的,该分布用于通过将多个几何形状作为任务的候选表示来分析一组用户演示。我们根据迭代线性二次调节器(ILQR)提出了复制问题作为一般最佳控制问题,其中使用提取的坐标系中的高斯分布来定义成本函数。我们将方法应用于模拟和7轴Franka Emika机器人中的对象抓握和箱式打开任务。结果表明,机器人可以利用几个几何形状来执行操纵任务并将其推广到新情况下,通过维护感兴趣的坐标系中任务的不变特征。
translated by 谷歌翻译
Jaccard索引,也称为交叉联盟(iou),是图像语义分段中最关键的评估度量之一。然而,由于学习目的既不可分解也不是可分解的,则iou得分的直接优化是非常困难的。虽然已经提出了一些算法来优化其代理,但没有提供泛化能力的保证。在本文中,我们提出了一种边缘校准方法,可以直接用作学习目标,在数据分布上改善IOO的推广,通过刚性下限为基础。本方案理论上,根据IOU分数来确保更好的分割性能。我们评估了在七个图像数据集中所提出的边缘校准方法的有效性,显示使用深度分割模型的其他学习目标的IOU分数大量改进。
translated by 谷歌翻译
兴趣点检测是计算机视觉和图像处理中最根本,最关键的问题之一。在本文中,我们对图像特征信息(IFI)提取技术进行了全面综述,以进行利益点检测。为了系统地介绍现有的兴趣点检测方法如何从输入图像中提取IFI,我们提出了IFI提取技术的分类学检测。根据该分类法,我们讨论了不同类型的IFI提取技术以进行兴趣点检测。此外,我们确定了与现有的IFI提取技术有关的主要未解决的问题,以及以前尚未讨论过的任何兴趣点检测方法。提供了现有的流行数据集和评估标准,并评估和讨论了18种最先进方法的性能。此外,还详细阐述了有关IFI提取技术的未来研究方向。
translated by 谷歌翻译
In this paper, we study the problem of knowledge-intensive text-to-SQL, in which domain knowledge is necessary to parse expert questions into SQL queries over domain-specific tables. We formalize this scenario by building a new Chinese benchmark KnowSQL consisting of domain-specific questions covering various domains. We then address this problem by presenting formulaic knowledge, rather than by annotating additional data examples. More concretely, we construct a formulaic knowledge bank as a domain knowledge base and propose a framework (ReGrouP) to leverage this formulaic knowledge during parsing. Experiments using ReGrouP demonstrate a significant 28.2% improvement overall on KnowSQL.
translated by 谷歌翻译
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only the image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes the model inspection hard to distinguish object boundaries. Besides, the use of CAM also brings a dilemma problem that the classification and localization always suffer from a performance gap and can not reach their highest accuracy simultaneously. In this paper, we propose a casual knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma problem between classification and localization performance.
translated by 谷歌翻译
Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with deep Q-learning to handle the mixed frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal one and establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.
translated by 谷歌翻译
Nowadays, time-stamped web documents related to a general news query floods spread throughout the Internet, and timeline summarization targets concisely summarizing the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper, we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the feature of event-level attention in its generation process with sequential information remained and use it to simulate the evolutionary attention of the ground truth summary. The event-level attention can also be used to assist in extracting summary, where the extracted summary also comes in time sequence. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline 17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.
translated by 谷歌翻译
Hybrid unmanned aerial vehicles (UAVs) integrate the efficient forward flight of fixed-wing and vertical takeoff and landing (VTOL) capabilities of multicopter UAVs. This paper presents the modeling, control and simulation of a new type of hybrid micro-small UAVs, coined as lifting-wing quadcopters. The airframe orientation of the lifting wing needs to tilt a specific angle often within $ 45$ degrees, neither nearly $ 90$ nor approximately $ 0$ degrees. Compared with some convertiplane and tail-sitter UAVs, the lifting-wing quadcopter has a highly reliable structure, robust wind resistance, low cruise speed and reliable transition flight, making it potential to work fully-autonomous outdoor or some confined airspace indoor. In the modeling part, forces and moments generated by both lifting wing and rotors are considered. Based on the established model, a unified controller for the full flight phase is designed. The controller has the capability of uniformly treating the hovering and forward flight, and enables a continuous transition between two modes, depending on the velocity command. What is more, by taking rotor thrust and aerodynamic force under consideration simultaneously, a control allocation based on optimization is utilized to realize cooperative control for energy saving. Finally, comprehensive Hardware-In-the-Loop (HIL) simulations are performed to verify the advantages of the designed aircraft and the proposed controller.
translated by 谷歌翻译
Due to their ability to offer more comprehensive information than data from a single view, multi-view (multi-source, multi-modal, multi-perspective, etc.) data are being used more frequently in remote sensing tasks. However, as the number of views grows, the issue of data quality becomes more apparent, limiting the potential benefits of multi-view data. Although recent deep neural network (DNN) based models can learn the weight of data adaptively, a lack of research on explicitly quantifying the data quality of each view when fusing them renders these models inexplicable, performing unsatisfactorily and inflexible in downstream remote sensing tasks. To fill this gap, in this paper, evidential deep learning is introduced to the task of aerial-ground dual-view remote sensing scene classification to model the credibility of each view. Specifically, the theory of evidence is used to calculate an uncertainty value which describes the decision-making risk of each view. Based on this uncertainty, a novel decision-level fusion strategy is proposed to ensure that the view with lower risk obtains more weight, making the classification more credible. On two well-known, publicly available datasets of aerial-ground dual-view remote sensing images, the proposed approach achieves state-of-the-art results, demonstrating its effectiveness. The code and datasets of this article are available at the following address: https://github.com/gaopiaoliang/Evidential.
translated by 谷歌翻译